Rule-based Dependency Parse Collapsing and Propagation for German and English
Abstract
We present a flexible open-source framework that performs dependency parsing with collapsed dependencies. The framework features a rule-based annotator that operates directly on the output of a dependency parser. It can therefore add dependency collapsing and propagation (de Marneffe et al., 2006) to parsers that lack this functionality. Collapsing is a technique for dependency parses in which words, mainly prepositions, are moved into the name of the dependency relation. Propagation assigns syntactic roles to all items involved in a conjunction. Currently, only the Stanford parser offers these capabilities for English. We introduce a rule-based collapsing engine that can be applied to the output of a dependency parser, and we used it to re-engineer the rules of the English Stanford parser. Furthermore, we provide the first dependency parser with collapsing and propagation for German. We compare our collapsing for English directly with that of the Stanford parser. Additionally, we evaluate collapsed and non-collapsed syntactic dependencies extrinsically by using them as features for building a distributional thesaurus (DT).
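The two operations named in the abstract can be illustrated with a minimal Python sketch. It assumes Stanford-style basic labels for English (`prep`, `pobj`, `cc`, `conj`) and tokens indexed from 1, with 0 reserved for ROOT. It is not the paper's rule engine: the `Dep` tuple and the functions `collapse` and `propagate` are hypothetical. Propagation here only copies the *incoming* relations of the first conjunct to the other conjuncts. German rules would need different labels.

```python
from typing import List, NamedTuple


class Dep(NamedTuple):
    rel: str  # relation label, e.g. "nsubj" or "prep_with"
    gov: int  # index of the governor token (0 = ROOT)
    dep: int  # index of the dependent token


def collapse(tokens: List[str], deps: List[Dep]) -> List[Dep]:
    """Fold prepositions and coordinating conjunctions into relation names."""
    # preposition index -> index of its object
    pobj = {d.gov: d.dep for d in deps if d.rel == "pobj"}
    # first conjunct index -> lowercased conjunction word (only if a conj exists)
    conj_heads = {d.gov for d in deps if d.rel == "conj"}
    cc = {d.gov: tokens[d.dep].lower() for d in deps
          if d.rel == "cc" and d.gov in conj_heads}
    # prepositions that have an object and can therefore be collapsed
    preps = {d.dep for d in deps if d.rel == "prep" and d.dep in pobj}

    out = []
    for d in deps:
        if d.rel == "prep" and d.dep in preps:
            # prep(gov, P) + pobj(P, x)  ->  prep_P(gov, x)
            out.append(Dep(f"prep_{tokens[d.dep].lower()}", d.gov, pobj[d.dep]))
        elif d.rel == "pobj" and d.gov in preps:
            continue  # absorbed into the prep_* relation
        elif d.rel == "cc" and d.gov in cc:
            continue  # conjunction word moves into the conj_* label
        elif d.rel == "conj" and d.gov in cc:
            out.append(Dep(f"conj_{cc[d.gov]}", d.gov, d.dep))
        else:
            out.append(d)
    return out


def propagate(deps: List[Dep]) -> List[Dep]:
    """Copy the incoming relations of the first conjunct to every other conjunct."""
    out = list(deps)
    pairs = [(d.gov, d.dep) for d in deps if d.rel.startswith("conj_")]
    for first, other in pairs:
        for d in deps:
            if d.dep == first and not d.rel.startswith("conj"):
                new = Dep(d.rel, d.gov, other)
                if new not in out:
                    out.append(new)
    return out


# "Bill and Mary saw the man with a telescope" (basic, uncollapsed parse)
TOKENS = ["ROOT", "Bill", "and", "Mary", "saw", "the", "man",
          "with", "a", "telescope"]
BASIC = [
    Dep("root", 0, 4), Dep("nsubj", 4, 1), Dep("cc", 1, 2), Dep("conj", 1, 3),
    Dep("det", 6, 5), Dep("dobj", 4, 6), Dep("prep", 4, 7),
    Dep("pobj", 7, 9), Dep("det", 9, 8),
]

if __name__ == "__main__":
    for d in propagate(collapse(TOKENS, BASIC)):
        print(f"{d.rel}({TOKENS[d.gov]}, {TOKENS[d.dep]})")
```

Running the script replaces `prep(saw, with)` and `pobj(with, telescope)` with `prep_with(saw, telescope)`. It turns `cc`/`conj` into `conj_and(Bill, Mary)`. Propagation then adds `nsubj(saw, Mary)`, so both conjuncts are recorded as subjects. A full rule set would also propagate outgoing relations, for example a shared subject in "Bill sings and dances".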